Brief Overview

The goal of this project is to identify how a country's average income relates to the Netflix standard subscription cost and to the number of Netflix subscribers in that country, while controlling for variables such as GDP, inflation, and population. We also want to group countries that are similar in library size, number of TV shows, number of movies, and subscribers, while also considering each country's average income per capita, population, and GDP per capita, and then look for interesting patterns among these clusters, such as how the standard Netflix subscription price varies between them.

We also want to see how library size, number of movies, and number of TV shows vary between developing and developed countries, and to build a binary decision tree classifier that

  1. classifies the standard Netflix subscription price as "very low", "low", "high", or "very high", and
  2. distinguishes between developed and developing countries.
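
The four price labels have to be derived from the numeric price before a classifier can be trained on them. The overview does not say how the cut points were chosen, so the sketch below assumes quartile cut points purely for illustration. The function name `label_prices` and the example prices are hypothetical and are not taken from the project data.

```python
from statistics import quantiles

# The four target labels named in the overview, ordered from cheapest to priciest.
LABELS = ["very low", "low", "high", "very high"]


def label_prices(prices):
    """Map each price to one of the four labels.

    Assumption: cut points are the quartiles of the observed prices.
    The project may have used different thresholds.
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, q2, q3 = quantiles(prices, n=4, method="inclusive")
    labels = []
    for price in prices:
        if price <= q1:
            labels.append(LABELS[0])
        elif price <= q2:
            labels.append(LABELS[1])
        elif price <= q3:
            labels.append(LABELS[2])
        else:
            labels.append(LABELS[3])
    return labels


# Made-up prices for illustration only, not values from the project data.
example_prices = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
example_labels = label_prices(example_prices)
```

Labels produced this way could then serve as the target column for a decision tree, with features such as income, GDP, and population as inputs. Quartiles give roughly balanced classes, which is one reason to consider them, but fixed price thresholds would work just as well with the same function shape.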

Image shows variation in Netflix standard cost across different countries.

Synopsis of findings

  1. The standard subscription price varies significantly among the 3 clusters of countries that were discovered.
  2. A country's income significantly affects the Netflix subscription price.
  3. A country's income does not significantly affect the number of Netflix subscribers.
  4. Developed countries have a larger library size and more TV shows and movies in total.
  5. There is no useful pattern between a country's population size and its library size.

Synopsis of most important lessons from the effort

  1. It sometimes helps to start with simple questions to build momentum.
  2. Start with some descriptive tasks (human-interpretable patterns that describe the data) if you are finding it tough to start with prediction tasks.
  3. It’s better to build several versions of a model by iterating multiple times than to spend an enormous amount of time building one model.